Skip to content

CDRIVER-4689 Additional partial implementation of OIDC authentication #2018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 184 commits into from

Conversation

ghost
Copy link

@ghost ghost commented May 19, 2025

This is my attempt to revive a patch stack that Kyle Kloberdanz was working on about 1 year ago. Kyle has since left MongoDB. Unrelatedly, I'm also leaving. (To protest the "AI" hype here.) My last day will be this Wednesday May 21st and I'm losing access to my corporate accounts. This GitHub account will be deleted before I lose access, so if this PR won't be merged by then someone will need to copy the branch if the work is to be continued.

The commit stack is a bit harrowing and I would not recommend reviewing commit-by-commit. My process here was to (very interactively) rebase an old copy of Kyle's branch on main, while editing out changes that conflicted with the intervening work on OIDC callbacks. After the rebase, I added commits intended to fix specific problems as described in the messages.

What's maybe good?

  • Lots of the work here was necessary plumbing to support OIDC speculative auth
  • We try to handle OIDC re-authentication replies by doing a retry
  • There's an OIDC credential cache, with invalidation rules that try to follow the auth spec
  • Some refactoring (_mongoc_sasl_run_command)
  • Some new type checks for the unified test runner
  • Added and documented mongoc_oidc_callback_copy

What's a problem:

  • Evergreen changes add an "example-task" that was clearly a work in progress
  • New set_oidc_callback functions are missing documentation
  • Incomplete thread safety work: set_oidc_callback functions have no safety checks or locking, meanwhile we have a global mutex (_oidc_callback_mutex) that seems unjustified.
  • Changes intending to zero sensitive memory before free are incomplete (Sensitive data lingers in bson_t, and passes through growable buffers)
  • Might invalidate the wrong cached token (see TODO comment)
  • OIDC support in test framework is headed in a brittle direction, without a single source of truth for the overall intended authentication mode (see the various places where mechanism is checked, and everywhere we use the "OIDC" environment variable)
  • I'm not sure the choice of ownership rules for oidc_callback_t make sense: The call sites for set_oidc_callback here would all be improved by having set_oidc_callback take ownership of the callback_t instead of copying it.

What's just incomplete:

  • Testing still needs lots of work, especially on the evergreen infrastructure side.
  • "ENVIRONMENT" options are not yet hooked up, and the environment-specific callbacks are still just stubs.

@ghost ghost self-requested a review as a code owner May 19, 2025 16:15
@ghost ghost requested review from rcsanchez97, kevinAlbs and eramongodb and removed request for rcsanchez97 May 19, 2025 16:15
@ghost
Copy link
Author

ghost commented May 20, 2025

Note, I've been debugging the mock-server-test failure but I'm not confident I'll be able to provide a solution before my accounts lock at some unspecified time.

The failure is in _test_topology_scanner: It calls test_framework_get_uri to set up the topology scanner (mongoc_topology_scanner_new) before any mock servers are started, and without knowledge of the mock server ports. It will try to connect to the default test server, to check whether it's a single server or a member of a replica set. In the mock-server-test configuration, my understanding so far (though I have not verified this) is that it's intended to run without a default server, so this approach will fail.

@ghost
Copy link
Author

ghost commented May 21, 2025

I won't have time to test and push a fix for the _test_topology_scanner failure, but for whoever works on this next:

  • The "uri" parameter for mongoc_topology_scanner_new is to blame. Prior to this PR, creating a topology scanner could succeed with a NULL uri. It isn't possible to create a correct URI before starting the mock servers, so the temporary URI created here to avoid passing in NULL is not going to be correct.
  • I added the fake uri while debugging the older patch stack from Kyle, without realizing the implications of that in the particular context of mock server testing. Without the fake uri, I was hitting a segfault due to the parameter not actually being optional.
  • This ambiguity around optional-ness of 'uri' is why I added explicit assertions.
  • The actual site where 'uri' is referenced (causing that segfault) was IIRC a new "hello" call added to determine if we are connecting to a replicaset, but I no longer have the correct backtrace handy and reading this PR diff isn't jogging my memory. I'll attach this if I find it in time.
  • Now that I know there's a good reason for 'uri' to be optional, I'd modify the parameter assertions accordingly, revert the temporary URI i added in _test_topology_scanner, and fix the underlying added 'uri' reference so that we can handle the case where the scanner must be started before a correct URI is known.

@ghost
Copy link
Author

ghost commented May 21, 2025

With this patch:

diff --git a/src/libmongoc/src/mongoc/mongoc-topology-scanner.c b/src/libmongoc/src/mongoc/mongoc-topology-scanner.c
index 47f290da5..be1be94a1 100644
--- a/src/libmongoc/src/mongoc/mongoc-topology-scanner.c
+++ b/src/libmongoc/src/mongoc/mongoc-topology-scanner.c
@@ -444,7 +444,7 @@ mongoc_topology_scanner_new (const mongoc_uri_t *uri,
                              void *data,
                              int64_t connect_timeout_msec)
 {
-   BSON_ASSERT_PARAM (uri);
+   BSON_OPTIONAL_PARAM (uri);
    BSON_ASSERT_PARAM (topology_id);
    BSON_ASSERT_PARAM (log_and_monitor);
    BSON_ASSERT_PARAM (setup_err_cb);
diff --git a/src/libmongoc/tests/test-mongoc-topology-scanner.c b/src/libmongoc/tests/test-mongoc-topology-scanner.c
index f47799ff0..70cb0bbad 100644
--- a/src/libmongoc/tests/test-mongoc-topology-scanner.c
+++ b/src/libmongoc/tests/test-mongoc-topology-scanner.c
@@ -71,15 +71,13 @@ _test_topology_scanner (bool with_ssl)
    mongoc_log_and_monitor_instance_t log_and_monitor;
    mongoc_log_and_monitor_instance_init (&log_and_monitor);

-   mongoc_uri_t *uri = test_framework_get_uri ();
-   mongoc_topology_scanner_t *topology_scanner = mongoc_topology_scanner_new (uri,
+   mongoc_topology_scanner_t *topology_scanner = mongoc_topology_scanner_new (NULL,
                                                                               &topology_id,
                                                                               &log_and_monitor,
                                                                               test_topology_scanner_setup_err_cb,
                                                                               test_topology_scanner_helper,
                                                                               &finished,
                                                                               TIMEOUT);
-   mongoc_uri_destroy (uri);

 #ifdef MONGOC_ENABLE_SSL
    if (with_ssl) {

This backtrace shows the problem (which at this point is no longer a segfault, but a parameter assert deeper in the stack)

The parameter: uri, in function _mongoc_topology_scanner_get_speculative_auth_mechanism, cannot be NULL

Thread 1 "test-libmongoc" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at ./nptl/pthread_kill.c:44
#1  0x00007ffff7925f4f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff78d6fb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff78c1472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00005555557bf7e7 in _bson_assert_failed_on_param (param=param@entry=0x5555558594d6 "uri",
    func=func@entry=0x5555558b7d00 <__func__.32> "_mongoc_topology_scanner_get_speculative_auth_mechanism")
    at /home/micah/mongo-c-driver/src/libbson/src/bson/bson-macros.h:226
#5  0x00005555557bfbf3 in _mongoc_topology_scanner_get_speculative_auth_mechanism (uri=0x0)
    at /home/micah/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-topology-scanner.c:156
#6  _mongoc_topology_scanner_get_speculative_auth_mechanism (uri=<optimized out>)
    at /home/micah/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-topology-scanner.c:154
#7  0x00005555557c177d in _begin_hello_cmd (topology=topology@entry=0x7fffffffc9b0, node=node@entry=0x555555f72500,
    stream=stream@entry=0x0, is_setup_done=is_setup_done@entry=false, dns_result=dns_result@entry=0x555555e34bb0,
    initiate_delay_ms=initiate_delay_ms@entry=0, use_handshake=true)
    at /home/micah/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-topology-scanner.c:406
#8  0x00005555557c1ac2 in mongoc_topology_scanner_node_setup_tcp (topology=topology@entry=0x7fffffffc9b0,
    node=node@entry=0x555555f72500, error=error@entry=0x555555f72780)
    at /home/micah/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-topology-scanner.c:947
#9  0x00005555557c1f2b in mongoc_topology_scanner_node_setup (topology=0x7fffffffc9b0, node=0x555555f72500,
    error=0x555555f72780) at /home/micah/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-topology-scanner.c:1096
#10 0x00005555557c2082 in mongoc_topology_scanner_start (topology=topology@entry=0x7fffffffc9b0,
    obey_cooldown=obey_cooldown@entry=false)
    at /home/micah/mongo-c-driver/src/libmongoc/src/mongoc/mongoc-topology-scanner.c:1212
#11 0x00005555556cc657 in _test_topology_scanner (with_ssl=false)
    at /home/micah/mongo-c-driver/src/libmongoc/tests/test-mongoc-topology-scanner.c:114
#12 0x00005555557159e7 in TestSuite_RunTest (suite=suite@entry=0x7fffffffdf40, test=test@entry=0x555555b35900,
    count=count@entry=0x7fffffffdf0c) at /home/micah/mongo-c-driver/src/libmongoc/tests/TestSuite.c:608
#13 0x0000555555717406 in TestSuite_RunAll (suite=0x7fffffffdf40)
    at /home/micah/mongo-c-driver/src/libmongoc/tests/TestSuite.c:1039
#14 TestSuite_Run (suite=suite@entry=0x7fffffffdf40) at /home/micah/mongo-c-driver/src/libmongoc/tests/TestSuite.c:1084
#15 0x0000555555566568 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/micah/mongo-c-driver/src/libmongoc/tests/test-libmongoc-main.c:162

Note where we require a URI to test the auth mechanism. I haven't had a chance yet to look closely enough at the scanner's life cycle to determine where we're best off fixing this. It's possible this might be addressed in a way that also relates to the larger problem of tests not having a good single source of truth about the intended auth mechanism.

@ghost
Copy link
Author

ghost commented May 21, 2025

In the specific case of the mock server test, it would seem like we could defer creating the scanner until we know the first URI.

If this behavior is actually desirable more generally, maybe it would make sense to have a mongoc_topology_scanner_add_uri that sets up ts->uri on the first call and for subsequent calls maybe it could confirm that the URIs use compatible settings.

@ghost
Copy link
Author

ghost commented May 21, 2025

Here's a simple fix:

diff --git a/src/libmongoc/tests/test-mongoc-topology-scanner.c b/src/libmongoc/tests/test-mongoc-topology-scanner.c
index f47799ff0..705265f79 100644
--- a/src/libmongoc/tests/test-mongoc-topology-scanner.c
+++ b/src/libmongoc/tests/test-mongoc-topology-scanner.c
@@ -71,25 +71,6 @@ _test_topology_scanner (bool with_ssl)
    mongoc_log_and_monitor_instance_t log_and_monitor;
    mongoc_log_and_monitor_instance_init (&log_and_monitor);

-   mongoc_uri_t *uri = test_framework_get_uri ();
-   mongoc_topology_scanner_t *topology_scanner = mongoc_topology_scanner_new (uri,
-                                                                              &topology_id,
-                                                                              &log_and_monitor,
-                                                                              test_topology_scanner_setup_err_cb,
-                                                                              test_topology_scanner_helper,
-                                                                              &finished,
-                                                                              TIMEOUT);
-   mongoc_uri_destroy (uri);
-
-#ifdef MONGOC_ENABLE_SSL
-   if (with_ssl) {
-      copt.ca_file = CERT_CA;
-      copt.weak_cert_validation = 1;
-
-      mongoc_topology_scanner_set_ssl_opts (topology_scanner, &copt);
-   }
-#endif
-
    for (i = 0; i < NSERVERS; i++) {
       /* use max wire versions just to distinguish among responses */
       servers[i] = mock_server_with_auto_hello (i + WIRE_VERSION_MIN);
@@ -105,7 +86,25 @@ _test_topology_scanner (bool with_ssl)
 #endif

       mock_server_run (servers[i]);
+   }

+   mongoc_topology_scanner_t *topology_scanner = mongoc_topology_scanner_new (mock_server_get_uri(servers[0]),
+                                                                              &topology_id,
+                                                                              &log_and_monitor,
+                                                                              test_topology_scanner_setup_err_cb,
+                                                                              test_topology_scanner_helper,
+                                                                              &finished,
+                                                                              TIMEOUT);
+#ifdef MONGOC_ENABLE_SSL
+   if (with_ssl) {
+      copt.ca_file = CERT_CA;
+      copt.weak_cert_validation = 1;
+
+      mongoc_topology_scanner_set_ssl_opts (topology_scanner, &copt);
+   }
+#endif
+
+   for (i = 0; i < NSERVERS; i++) {
       mongoc_topology_scanner_add (
          topology_scanner, mongoc_uri_get_hosts (mock_server_get_uri (servers[i])), (uint32_t) i, false);
    }

@ghost ghost closed this by deleting the head repository May 21, 2025
@kevinAlbs
Copy link
Collaborator

This branch is pushed to https://github.com/mongodb/mongo-c-driver/tree/OIDC-CDRIVER-4689. The branch was added to the Restrict Updates (Branches) ruleset. Work for OIDC is pausing and is planned later.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants